Approximate String Matching with Reduced Alphabet
Abstract
We present a method to speed up approximate string matching by mapping the original alphabet to a smaller one. We apply this alphabet reduction scheme to a tuned version of the approximate Boyer–Moore algorithm that uses the Four-Russians technique. Our experiments show that alphabet reduction makes the algorithm faster. In particular, in the k-mismatch case, the new variant is faster than earlier algorithms on English data for small values of k.
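The idea of alphabet reduction can be illustrated with a small sketch. It is not the tuned Boyer–Moore variant of the paper and does not use the Four-Russians technique; it only shows why searching over a reduced alphabet is safe. Equal characters always map to the same reduced symbol, so a window with at most k mismatches in the original alphabet also has at most k mismatches after reduction. The reduced search can therefore act as a filter whose candidates are verified against the original text. The mapping `reduce_char` (code point modulo `sigma`) and the function names are assumptions made for illustration.

```python
def reduce_char(c, sigma=4):
    """Hypothetical reduction: fold a character into one of `sigma` classes."""
    return ord(c) % sigma


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def k_mismatch_reduced(text, pattern, k, sigma=4):
    """Start positions where `pattern` occurs in `text` with at most k mismatches.

    Each window is first tested in the reduced alphabet (the filter) and only
    the surviving candidates are checked in the original alphabet.
    """
    m = len(pattern)
    reduced_text = [reduce_char(c, sigma) for c in text]
    reduced_pattern = [reduce_char(c, sigma) for c in pattern]
    hits = []
    for i in range(len(text) - m + 1):
        # Filter: the reduction can only merge symbols, never separate equal
        # ones, so no true occurrence is rejected here.
        if hamming(reduced_text[i:i + m], reduced_pattern) > k:
            continue
        # Verification: discard false positives introduced by the reduction.
        if hamming(text[i:i + m], pattern) <= k:
            hits.append(i)
    return hits


def k_mismatch_naive(text, pattern, k):
    """Reference implementation without reduction, used for comparison."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if hamming(text[i:i + m], pattern) <= k]
```

This naive scan shows only the correctness of the filter, not the speed gain reported in the abstract; any speed-up depends on the underlying search algorithm and on how the reduction is chosen.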
Related resources
Improved Approximate Multiple Pattern String Matching using Consecutive Q Grams of Pattern
String matching is the task of finding all occurrences of a given pattern in a large text, both being sequences of characters drawn from a finite alphabet. The problem is fundamental in computer science and underlies many applications, such as text retrieval, symbol manipulation, computational biology, data mining, and network security. Bit parallelism method is used for increasing the proce...
On-line Approximate String Matching in Natural Language
We consider approximate pattern matching in natural language text. We use the words of the text as the alphabet, instead of the characters as in traditional string matching approaches. Hence our pattern consists of a sequence of words. From the algorithmic point of view this has several advantages: (i) the number of words is much less than the number of characters, which in effect means shorter...
Approximate Multiple Pattern String Matching using Bit Parallelism: A Review
String matching is the task of finding all occurrences of a given pattern in a large text, both being sequences of characters drawn from a finite alphabet. Approximate string matching detects exact occurrences of the pattern along with occurrences that contain some errors. Bit parallelism is a feature that can be used to detect patterns inside the text and is reported to result in mor...
Sublinear Approximate String Matching
The present paper deals with the subject of approximate string matching and demonstrates how Chang and Lawler [CL94] conceived a new sublinear time algorithm out of ideas that had previously been known. The problem is to find all locations in a text of length n over a b-letter alphabet where a pattern of length m occurs with up to k differences (substitutions, insertions, deletions). The algori...
Approximate String Matching using Within-word Parallelism
Given a text string, a pattern string, and an integer k, the problem of approximate string matching with k differences is to find all substrings of the text string whose edit distance from the pattern string is less than k. The edit distance between two strings is defined as the minimum number of differences, where a difference can be a substitution, insertion, or deletion of a single character....
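The k-differences problem described in this entry can be sketched with the classical column-wise dynamic programming approach (often attributed to Sellers). This is not the within-word parallel method of the listed paper. The function name `k_differences_ends` is hypothetical, and the sketch assumes the common "at most k" threshold; for the strict "less than k" wording used above, call it with k - 1.

```python
def k_differences_ends(text, pattern, k):
    """End positions in `text` of substrings within edit distance k of `pattern`.

    col[i] holds the minimum edit distance between pattern[:i] and any
    substring of the text ending at the current text position.
    """
    m = len(pattern)
    col = list(range(m + 1))  # before any text is read: i deletions
    ends = []
    for j, tc in enumerate(text):
        prev_diag = col[0]  # value for (i - 1, previous text position)
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            prev_here = col[i]  # value for (i, previous text position)
            col[i] = min(
                prev_diag + (pattern[i - 1] != tc),  # match or substitution
                prev_here + 1,                       # extra text character
                col[i - 1] + 1,                      # missing pattern character
            )
            prev_diag = prev_here
        if col[m] <= k:
            ends.append(j)
    return ends
```

The sketch runs in O(mn) time for a pattern of length m and a text of length n, using a single column of m + 1 integers.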